Overview

Dataset statistics

Number of variables: 8
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.4 MiB
Average record size in memory: 2.8 KiB
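The headline counts above can be approximated with plain pandas. This is a minimal sketch that assumes the data is already in a DataFrame. The helper name `dataset_overview` is hypothetical, and `memory_usage(deep=True)` may not match pandas-profiling's own memory accounting exactly.

```python
import pandas as pd


def dataset_overview(df: pd.DataFrame) -> dict:
    """Recompute the headline 'Dataset statistics' figures for a DataFrame."""
    n_rows, n_cols = df.shape
    missing = int(df.isna().sum().sum())       # missing cells across all columns
    duplicates = int(df.duplicated().sum())    # rows identical to an earlier row
    total_bytes = int(df.memory_usage(deep=True).sum())
    return {
        "n_variables": n_cols,
        "n_observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / df.size if df.size else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
        "total_bytes": total_bytes,
        "avg_record_bytes": total_bytes / n_rows if n_rows else 0.0,
    }


# Toy frame (not the movie data): one missing cell and one duplicated row.
toy = pd.DataFrame({"a": [1, 1, 2, None], "b": ["x", "x", "y", "z"]})
print(dataset_overview(toy))
```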

Variable types

Numeric: 7
Categorical: 1

Alerts

original_title has high cardinality: 500 distinct values (High cardinality)
popularity is highly correlated with budget and 3 other fields (High correlation)
revenue is highly correlated with budget and 2 other fields (High correlation)
vote_average is highly correlated with popularity and 1 other field (High correlation)
vote_count is highly correlated with popularity and 2 other fields (High correlation)
df_index is highly correlated with release_year (High correlation)
budget is highly correlated with popularity and 1 other field (High correlation)
release_year is highly correlated with df_index (High correlation)
original_title is uniformly distributed (Uniform)
df_index has unique values (Unique)
original_title has unique values (Unique)
popularity has unique values (Unique)
revenue has unique values (Unique)

Reproduction

Analysis started: 2022-10-19 09:54:12.476205
Analysis finished: 2022-10-19 09:54:21.565601
Duration: 9.09 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2030.428
Minimum: 0
Maximum: 14383
Zeros: 1
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 119.4
Q1: 649
Median: 1639.5
Q3: 2646.25
95th percentile: 6663.5
Maximum: 14383
Range: 14383
Interquartile range (IQR): 1997.25

Descriptive statistics

Standard deviation: 1935.169859
Coefficient of variation (CV): 0.9530846987
Kurtosis: 8.166330218
Mean: 2030.428
Median Absolute Deviation (MAD): 999
Skewness: 2.33464605
Sum: 1015214
Variance: 3744882.382
Monotonicity: Not monotonic
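These statistics follow standard definitions: IQR = Q3 − Q1, CV = standard deviation / mean, Range = Maximum − Minimum, and Variance = (standard deviation)². For `df_index`, 2646.25 − 649 = 1997.25 and 1935.169859 / 2030.428 ≈ 0.953, which match the values above. The sketch below recomputes these statistics with pandas. The helper name is hypothetical, and the sketch makes several assumptions that may differ in detail from pandas-profiling's internals:

- quantiles use pandas' default linear interpolation;
- `skew()` and `kurt()` are bias-corrected, with kurtosis reported as excess kurtosis;
- MAD is the median absolute deviation from the median.

```python
import pandas as pd


def describe_numeric(s: pd.Series) -> dict:
    """Recompute the report's quantile and descriptive statistics for one column."""
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])  # linear interpolation (pandas default)
    mean, std = s.mean(), s.std()                   # std uses ddof=1 (sample)
    return {
        "range": s.max() - s.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "mean": mean,
        "std": std,
        "variance": s.var(),
        "cv": std / mean,                           # coefficient of variation
        "mad": (s - median).abs().median(),         # median absolute deviation
        "skewness": s.skew(),
        "kurtosis": s.kurt(),                       # excess kurtosis: normal data -> about 0
        "sum": s.sum(),
        "increasing": s.is_monotonic_increasing,
        "decreasing": s.is_monotonic_decreasing,
    }


stats = describe_numeric(pd.Series([1, 2, 3, 4, 100]))
print(stats["median"], stats["iqr"], stats["mad"])  # 3.0 2.0 1.0
```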
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1639 | 1 | 0.2%
2015 | 1 | 0.2%
3677 | 1 | 0.2%
2223 | 1 | 0.2%
3724 | 1 | 0.2%
6648 | 1 | 0.2%
7075 | 1 | 0.2%
1774 | 1 | 0.2%
1537 | 1 | 0.2%
2241 | 1 | 0.2%
Other values (490) | 490 | 98.0%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | 0.2%
1 | 1 | 0.2%
5 | 1 | 0.2%
8 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%
13 | 1 | 0.2%
14 | 1 | 0.2%
15 | 1 | 0.2%
18 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
14383 | 1 | 0.2%
14045 | 1 | 0.2%
10098 | 1 | 0.2%
9643 | 1 | 0.2%
9118 | 1 | 0.2%
8943 | 1 | 0.2%
8278 | 1 | 0.2%
8178 | 1 | 0.2%
7993 | 1 | 0.2%
7859 | 1 | 0.2%

budget
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 80
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47689593.1
Minimum: 19000000
Maximum: 200000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 19000000
5th percentile: 20000000
Q1: 27000000
Median: 40000000
Q3: 60000000
95th percentile: 95150000
Maximum: 200000000
Range: 181000000
Interquartile range (IQR): 33000000

Descriptive statistics

Standard deviation: 26749694.59
Coefficient of variation (CV): 0.5609126194
Kurtosis: 4.718904466
Mean: 47689593.1
Median Absolute Deviation (MAD): 15000000
Skewness: 1.78498463
Sum: 2.384479655 × 10^10
Variance: 7.155461606 × 10^14
Monotonicity: Decreasing
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
25000000 | 36 | 7.2%
30000000 | 31 | 6.2%
50000000 | 28 | 5.6%
40000000 | 27 | 5.4%
20000000 | 25 | 5.0%
60000000 | 23 | 4.6%
35000000 | 22 | 4.4%
45000000 | 18 | 3.6%
70000000 | 17 | 3.4%
80000000 | 15 | 3.0%
Other values (70) | 258 | 51.6%

Minimum 10 values

Value | Count | Frequency (%)
19000000 | 5 | 1.0%
19885552 | 1 | 0.2%
20000000 | 25 | 5.0%
21000000 | 4 | 0.8%
22000000 | 12 | 2.4%
23000000 | 12 | 2.4%
24000000 | 9 | 1.8%
25000000 | 36 | 7.2%
25530000 | 1 | 0.2%
26000000 | 8 | 1.6%

Maximum 10 values

Value | Count | Frequency (%)
200000000 | 1 | 0.2%
175000000 | 1 | 0.2%
170000000 | 1 | 0.2%
160000000 | 1 | 0.2%
150000000 | 1 | 0.2%
140000000 | 2 | 0.4%
135000000 | 1 | 0.2%
133000000 | 1 | 0.2%
130000000 | 1 | 0.2%
125000000 | 1 | 0.2%

original_title
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Jurassic Park | 1
Brokedown Palace | 1
Space Truckers | 1
Ghost | 1
The Rock | 1
Other values (495) | 495

Length

Max length: 55
Median length: 35
Mean length: 14.384
Min length: 3

Characters and Unicode

Total characters: 7192
Distinct characters: 102
Distinct categories: 9
Distinct scripts: 5
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
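Python's standard `unicodedata` module exposes one of these properties, the general category, through `unicodedata.category`. The sketch below tallies characters per category in the spirit of the "Most occurring categories" table. The long names in `CATEGORY_NAMES` are written out by hand, and only for the codes that appear in this report. The report also counts scripts and blocks, but the standard library does not expose those properties, so this sketch covers categories only.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lm": "Modifier Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "No": "Other Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}


def category_counts(texts):
    """Tally characters by Unicode general category across all strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)          # e.g. "Lu", "Nd", "Zs"
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["The 13th Warrior", "Another 48 Hrs."]).most_common())
```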

Unique

Unique: 500
Unique (%): 100.0%

Sample

1st row: Titanic
2nd row: Waterworld
3rd row: Wild Wild West
4th row: The 13th Warrior
5th row: Tarzan

Common Values

Value | Count | Frequency (%)
Jurassic Park | 1 | 0.2%
Brokedown Palace | 1 | 0.2%
Space Truckers | 1 | 0.2%
Ghost | 1 | 0.2%
The Rock | 1 | 0.2%
Another 48 Hrs. | 1 | 0.2%
8MM | 1 | 0.2%
The Long Kiss Goodnight | 1 | 0.2%
The Hunchback of Notre Dame | 1 | 0.2%
How Stella Got Her Groove Back | 1 | 0.2%
Other values (490) | 490 | 98.0%

Length

[Figure: Histogram of lengths of the category]

Word | Count | Frequency (%)
the | 156 | 12.1%
of | 36 | 2.8%
in | 17 | 1.3%
a | 15 | 1.2%
2 | 10 | 0.8%
and | 10 | 0.8%
man | 9 | 0.7%
i | 6 | 0.5%
city | 6 | 0.5%
world | 6 | 0.5%
Other values (808) | 1018 | 79.0%
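The table above is a word-frequency count over all titles. Its listed counts sum to 1,289 words, so "the" accounts for 156 / 1289 ≈ 12.1% of them. The sketch below produces this kind of count. It assumes lower-casing and splitting on runs of word characters; pandas-profiling's actual tokenizer may treat punctuation or stop words differently. `word_counts` is a hypothetical helper.

```python
import re
from collections import Counter


def word_counts(titles):
    """Lower-case each title, split it into runs of word characters, and tally them."""
    counts = Counter()
    for title in titles:
        counts.update(re.findall(r"\w+", title.lower()))
    return counts


titles = ["The Rock", "The Long Kiss Goodnight", "Another 48 Hrs."]
counts = word_counts(titles)
total = sum(counts.values())
for word, n in counts.most_common(3):
    print(f"{word} | {n} | {100 * n / total:.1f}%")
```

With this regex, a title such as "Wayne's World" splits into "wayne", "s" and "world". A different tokenizer would count it differently.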

Most occurring characters

Value | Count | Frequency (%)
(space) | 789 | 11.0%
e | 759 | 10.6%
a | 458 | 6.4%
r | 408 | 5.7%
n | 388 | 5.4%
o | 383 | 5.3%
t | 382 | 5.3%
i | 372 | 5.2%
s | 285 | 4.0%
h | 274 | 3.8%
Other values (92) | 2694 | 37.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 5080 | 70.6%
Uppercase Letter | 1158 | 16.1%
Space Separator | 789 | 11.0%
Other Punctuation | 71 | 1.0%
Other Letter | 45 | 0.6%
Decimal Number | 40 | 0.6%
Dash Punctuation | 3 | < 0.1%
Modifier Letter | 3 | < 0.1%
Other Number | 3 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 759 | 14.9%
a | 458 | 9.0%
r | 408 | 8.0%
n | 388 | 7.6%
o | 383 | 7.5%
t | 382 | 7.5%
i | 372 | 7.3%
s | 285 | 5.6%
h | 274 | 5.4%
l | 235 | 4.6%
Other values (17) | 1136 | 22.4%

Other Letter
Value | Count | Frequency (%)
 | 4 | 8.9%
 | 3 | 6.7%
 | 3 | 6.7%
 | 3 | 6.7%
 | 3 | 6.7%
 | 2 | 4.4%
 | 2 | 4.4%
 | 2 | 4.4%
 | 2 | 4.4%
 | 2 | 4.4%
Other values (17) | 19 | 42.2%

Uppercase Letter
Value | Count | Frequency (%)
T | 178 | 15.4%
S | 104 | 9.0%
M | 82 | 7.1%
A | 69 | 6.0%
B | 68 | 5.9%
D | 66 | 5.7%
P | 60 | 5.2%
C | 59 | 5.1%
F | 54 | 4.7%
I | 53 | 4.6%
Other values (16) | 365 | 31.5%

Other Punctuation
Value | Count | Frequency (%)
: | 30 | 42.3%
' | 17 | 23.9%
. | 16 | 22.5%
! | 3 | 4.2%
& | 2 | 2.8%
? | 1 | 1.4%
/ | 1 | 1.4%
, | 1 | 1.4%

Decimal Number
Value | Count | Frequency (%)
2 | 14 | 35.0%
0 | 7 | 17.5%
3 | 7 | 17.5%
1 | 5 | 12.5%
4 | 3 | 7.5%
8 | 2 | 5.0%
9 | 1 | 2.5%
7 | 1 | 2.5%

Other Number
Value | Count | Frequency (%)
³ | 1 | 33.3%
 | 1 | 33.3%
½ | 1 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 789 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 3 | 100.0%

Modifier Letter
Value | Count | Frequency (%)
 | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 6238 | 86.7%
Common | 909 | 12.6%
Katakana | 27 | 0.4%
Han | 12 | 0.2%
Hiragana | 6 | 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 759 | 12.2%
a | 458 | 7.3%
r | 408 | 6.5%
n | 388 | 6.2%
o | 383 | 6.1%
t | 382 | 6.1%
i | 372 | 6.0%
s | 285 | 4.6%
h | 274 | 4.4%
l | 235 | 3.8%
Other values (43) | 2294 | 36.8%

Common
Value | Count | Frequency (%)
(space) | 789 | 86.8%
: | 30 | 3.3%
' | 17 | 1.9%
. | 16 | 1.8%
2 | 14 | 1.5%
0 | 7 | 0.8%
3 | 7 | 0.8%
1 | 5 | 0.6%
! | 3 | 0.3%
- | 3 | 0.3%
Other values (12) | 18 | 2.0%

Katakana
Value | Count | Frequency (%)
 | 3 | 11.1%
 | 3 | 11.1%
 | 3 | 11.1%
 | 3 | 11.1%
 | 2 | 7.4%
 | 2 | 7.4%
 | 2 | 7.4%
 | 2 | 7.4%
 | 1 | 3.7%
 | 1 | 3.7%
Other values (5) | 5 | 18.5%

Han
Value | Count | Frequency (%)
 | 2 | 16.7%
 | 2 | 16.7%
 | 2 | 16.7%
 | 1 | 8.3%
 | 1 | 8.3%
 | 1 | 8.3%
 | 1 | 8.3%
 | 1 | 8.3%
 | 1 | 8.3%

Hiragana
Value | Count | Frequency (%)
 | 4 | 66.7%
 | 1 | 16.7%
 | 1 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7140 | 99.3%
Katakana | 30 | 0.4%
CJK | 12 | 0.2%
Hiragana | 6 | 0.1%
None | 3 | < 0.1%
Number Forms | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 789 | 11.1%
e | 759 | 10.6%
a | 458 | 6.4%
r | 408 | 5.7%
n | 388 | 5.4%
o | 383 | 5.4%
t | 382 | 5.4%
i | 372 | 5.2%
s | 285 | 4.0%
h | 274 | 3.8%
Other values (60) | 2642 | 37.0%

Hiragana
Value | Count | Frequency (%)
 | 4 | 66.7%
 | 1 | 16.7%
 | 1 | 16.7%

Katakana
Value | Count | Frequency (%)
 | 3 | 10.0%
 | 3 | 10.0%
 | 3 | 10.0%
 | 3 | 10.0%
 | 3 | 10.0%
 | 2 | 6.7%
 | 2 | 6.7%
 | 2 | 6.7%
 | 2 | 6.7%
 | 1 | 3.3%
Other values (6) | 6 | 20.0%

CJK
Value | Count | Frequency (%)
 | 2 | 16.7%
 | 2 | 16.7%
 | 2 | 16.7%
 | 1 | 8.3%
 | 1 | 8.3%
 | 1 | 8.3%
 | 1 | 8.3%
 | 1 | 8.3%
 | 1 | 8.3%

None
Value | Count | Frequency (%)
³ | 1 | 33.3%
½ | 1 | 33.3%
è | 1 | 33.3%

Number Forms
Value | Count | Frequency (%)
 | 1 | 100.0%

popularity
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.50606286
Minimum: 0.788123
Maximum: 63.869599
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.788123
5th percentile: 3.4057613
Q1: 7.1689905
Median: 9.9880715
Q3: 12.87288425
95th percentile: 17.93653895
Maximum: 63.869599
Range: 63.081476
Interquartile range (IQR): 5.70389375

Descriptive statistics

Standard deviation: 5.911610257
Coefficient of variation (CV): 0.5626855976
Kurtosis: 22.7435647
Mean: 10.50606286
Median Absolute Deviation (MAD): 2.8636245
Skewness: 3.415252715
Sum: 5253.031428
Variance: 34.94713583
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
26.88907 | 1 | 0.2%
12.962525 | 1 | 0.2%
7.554848 | 1 | 0.2%
7.566321 | 1 | 0.2%
3.034244 | 1 | 0.2%
5.516316 | 1 | 0.2%
11.781895 | 1 | 0.2%
3.385342 | 1 | 0.2%
15.91126 | 1 | 0.2%
5.154373 | 1 | 0.2%
Other values (490) | 490 | 98.0%

Minimum 10 values

Value | Count | Frequency (%)
0.788123 | 1 | 0.2%
0.987196 | 1 | 0.2%
1.408176 | 1 | 0.2%
1.466461 | 1 | 0.2%
1.562471 | 1 | 0.2%
1.690768 | 1 | 0.2%
1.914881 | 1 | 0.2%
2.151436 | 1 | 0.2%
2.263584 | 1 | 0.2%
2.304986 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
63.869599 | 1 | 0.2%
51.645403 | 1 | 0.2%
48.307194 | 1 | 0.2%
41.725123 | 1 | 0.2%
39.39497 | 1 | 0.2%
33.366332 | 1 | 0.2%
26.88907 | 1 | 0.2%
24.30526 | 1 | 0.2%
23.984065 | 1 | 0.2%
23.63659 | 1 | 0.2%

release_year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1995.418
Minimum: 1990
Maximum: 1999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1990
5th percentile: 1990
Q1: 1993
Median: 1996
Q3: 1998
95th percentile: 1999
Maximum: 1999
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.783301913
Coefficient of variation (CV): 0.00139484655
Kurtosis: -0.9469485018
Mean: 1995.418
Median Absolute Deviation (MAD): 2
Skewness: -0.4463361136
Sum: 997709
Variance: 7.746769539
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
1999 | 76 | 15.2%
1997 | 70 | 14.0%
1998 | 68 | 13.6%
1996 | 60 | 12.0%
1995 | 51 | 10.2%
1994 | 43 | 8.6%
1993 | 36 | 7.2%
1991 | 34 | 6.8%
1992 | 31 | 6.2%
1990 | 31 | 6.2%

Minimum 10 values

Value | Count | Frequency (%)
1990 | 31 | 6.2%
1991 | 34 | 6.8%
1992 | 31 | 6.2%
1993 | 36 | 7.2%
1994 | 43 | 8.6%
1995 | 51 | 10.2%
1996 | 60 | 12.0%
1997 | 70 | 14.0%
1998 | 68 | 13.6%
1999 | 76 | 15.2%

Maximum 10 values

Value | Count | Frequency (%)
1999 | 76 | 15.2%
1998 | 68 | 13.6%
1997 | 70 | 14.0%
1996 | 60 | 12.0%
1995 | 51 | 10.2%
1994 | 43 | 8.6%
1993 | 36 | 7.2%
1992 | 31 | 6.2%
1991 | 34 | 6.8%
1990 | 31 | 6.2%

revenue
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 128721634.6
Minimum: 71368
Maximum: 1845034240
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 71368
5th percentile: 7400646
Q1: 22594055.5
Median: 75252928
Q3: 177995820
95th percentile: 377427152
Maximum: 1845034240
Range: 1844962872
Interquartile range (IQR): 155401764.5

Descriptive statistics

Standard deviation: 158997546.7
Coefficient of variation (CV): 1.235204534
Kurtosis: 30.11120029
Mean: 128721634.6
Median Absolute Deviation (MAD): 61024417.5
Skewness: 3.926900804
Sum: 6.436081731 × 10^10
Variance: 2.528021987 × 10^16
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1845034240 | 1 | 0.2%
378882400 | 1 | 0.2%
171120336 | 1 | 0.2%
222104688 | 1 | 0.2%
61698900 | 1 | 0.2%
448000000 | 1 | 0.2%
285444608 | 1 | 0.2%
553799552 | 1 | 0.2%
361832384 | 1 | 0.2%
300135360 | 1 | 0.2%
Other values (490) | 490 | 98.0%

Minimum 10 values

Value | Count | Frequency (%)
71368 | 1 | 0.2%
305070 | 1 | 0.2%
635096 | 1 | 0.2%
777423 | 1 | 0.2%
791830 | 1 | 0.2%
1060056 | 1 | 0.2%
1345903 | 1 | 0.2%
1531251 | 1 | 0.2%
1614266 | 1 | 0.2%
2075084 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
1845034240 | 1 | 0.2%
924317568 | 1 | 0.2%
920099968 | 1 | 0.2%
816969280 | 1 | 0.2%
788241792 | 1 | 0.2%
677945408 | 1 | 0.2%
672806272 | 1 | 0.2%
589390528 | 1 | 0.2%
553799552 | 1 | 0.2%
520000000 | 1 | 0.2%

vote_average
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 43
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.256398437
Minimum: 4.19921875
Maximum: 8.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 4.19921875
5th percentile: 5
Q1: 5.69921875
Median: 6.30078125
Q3: 6.80078125
95th percentile: 7.6015625
Maximum: 8.5
Range: 4.30078125
Interquartile range (IQR): 1.1015625

Descriptive statistics

Standard deviation: 0.8041992188
Coefficient of variation (CV): 0.1285402819
Kurtosis: -0.01870727539
Mean: 6.256398437
Median Absolute Deviation (MAD): 0.5
Skewness: -0.002069473267
Sum: 3128.199219
Variance: 0.6469726562
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=43)]

Common values

Value | Count | Frequency (%)
6.1015625 | 30 | 6.0%
6.3984375 | 29 | 5.8%
6.5 | 28 | 5.6%
6.19921875 | 27 | 5.4%
6.30078125 | 25 | 5.0%
6.69921875 | 25 | 5.0%
5.8984375 | 23 | 4.6%
6.80078125 | 21 | 4.2%
5.80078125 | 20 | 4.0%
6 | 20 | 4.0%
Other values (33) | 252 | 50.4%

Minimum 10 values

Value | Count | Frequency (%)
4.19921875 | 3 | 0.6%
4.30078125 | 1 | 0.2%
4.3984375 | 4 | 0.8%
4.5 | 5 | 1.0%
4.6015625 | 2 | 0.4%
4.69921875 | 3 | 0.6%
4.80078125 | 3 | 0.6%
4.8984375 | 3 | 0.6%
5 | 10 | 2.0%
5.1015625 | 11 | 2.2%

Maximum 10 values

Value | Count | Frequency (%)
8.5 | 1 | 0.2%
8.296875 | 3 | 0.6%
8.203125 | 5 | 1.0%
8.1015625 | 1 | 0.2%
8 | 1 | 0.2%
7.8984375 | 2 | 0.4%
7.80078125 | 2 | 0.4%
7.69921875 | 7 | 1.4%
7.6015625 | 6 | 1.2%
7.5 | 7 | 1.4%
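Two features of `vote_average` suggest that the ratings were stored as 16-bit floats:

- values such as 6.1015625, 6.3984375 and 8.203125, which look like one-decimal ratings slightly off;
- a memory footprint of 1.1 KiB, against 4.0 to 4.5 KiB for the other numeric columns.

This is an inference from the numbers; the report does not state it. The minimal NumPy sketch below assumes the raw ratings had one decimal place. Under that assumption, casting them to `float16` produces exactly the values listed above.

```python
import numpy as np

# Assumed raw ratings with one decimal place (illustrative values only).
ratings = [6.1, 6.4, 6.2, 6.3, 4.2, 8.2, 8.3]

# Casting to half precision rounds each value to the nearest representable float16.
as_half = np.array(ratings, dtype=np.float16)
for raw, half in zip(ratings, as_half):
    print(f"{raw} -> {float(half)}")

# Each float16 takes 2 bytes, versus 8 bytes for float64.
print(as_half.itemsize, np.dtype(np.float64).itemsize)
```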

vote_count
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 408
Distinct (%): 81.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 762.288
Minimum: 11
Maximum: 9680
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 11
5th percentile: 32.95
Q1: 140
Median: 361.5
Q3: 839.25
95th percentile: 3124.6
Maximum: 9680
Range: 9669
Interquartile range (IQR): 699.25

Descriptive statistics

Standard deviation: 1224.90015
Coefficient of variation (CV): 1.60687319
Kurtosis: 18.7057271
Mean: 762.288
Median Absolute Deviation (MAD): 270.5
Skewness: 3.859023631
Sum: 381144
Variance: 1500380.378
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
49 | 4 | 0.8%
128 | 4 | 0.8%
86 | 3 | 0.6%
522 | 3 | 0.6%
140 | 3 | 0.6%
52 | 3 | 0.6%
381 | 3 | 0.6%
20 | 3 | 0.6%
26 | 3 | 0.6%
300 | 3 | 0.6%
Other values (398) | 468 | 93.6%

Minimum 10 values

Value | Count | Frequency (%)
11 | 1 | 0.2%
12 | 1 | 0.2%
13 | 2 | 0.4%
17 | 3 | 0.6%
20 | 3 | 0.6%
21 | 1 | 0.2%
23 | 1 | 0.2%
24 | 2 | 0.4%
25 | 2 | 0.4%
26 | 3 | 0.6%

Maximum 10 values

Value | Count | Frequency (%)
9680 | 1 | 0.2%
9080 | 1 | 0.2%
8360 | 1 | 0.2%
8148 | 1 | 0.2%
7768 | 1 | 0.2%
5916 | 1 | 0.2%
5520 | 1 | 0.2%
5416 | 1 | 0.2%
5148 | 1 | 0.2%
4956 | 1 | 0.2%

Interactions

[Figure: pairwise interaction plots for the seven numeric variables (49 panels)]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables. This makes it better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
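The recipe above can be written in plain Python. This minimal sketch assumes that tied values receive the average of their ranks, which is the usual convention; the helper names are illustrative. A cubic relationship is monotonic but not linear, so Spearman's ρ reaches 1 while Pearson's r stays below it. In pandas the same quantity is available as `Series.corr(other, method="spearman")`.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # 1-based mean rank of the tied run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman(x, y), pearson(x, y))
```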

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for positive scale factors). For an exact linear relationship, therefore, only the sign of the slope matters: any increasing line gives r = 1 and any decreasing line gives r = -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
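The same definition takes only a few lines of plain Python (helper name illustrative). The sketch also checks the invariance property described above: shifting or positively rescaling either variable leaves r unchanged, and an exact line gives r = +1 or -1 regardless of how steep it is.

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n normalisation appears in numerator and denominator, so it cancels.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]
r = pearson(x, y)

# Separate shifts and positive rescalings of x and y leave r unchanged.
r_shifted = pearson([10 * a - 3 for a in x], [0.5 * b + 100 for b in y])

# An exact linear relationship gives r = +1 or -1 whatever the slope's size.
r_steep = pearson(x, [50 * a + 1 for a in x])
r_falling = pearson(x, [-0.2 * a for a in x])
print(round(r, 4), round(r_shifted, 4), r_steep, r_falling)
```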

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y with n observations, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
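The pair-counting definition above is the tau-a variant, sketched below in plain Python (helper name illustrative). Library routines may compute something slightly different. For example, SciPy's `scipy.stats.kendalltau`, which pandas uses for `Series.corr(method="kendall")`, computes the tie-adjusted tau-b by default. Its results therefore differ from this sketch when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau as described above (tau-a): (concordant - discordant) / n(n-1)/2."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of change in x and y -> concordant; opposite -> discordant.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 (a tie in x or y) counts as neither.
    return (concordant - discordant) / (n * (n - 1) / 2)


# One swapped pair out of six: 5 concordant, 1 discordant -> (5 - 1) / 6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```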

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.

Sample

First rows

 | df_index | budget | original_title | popularity | release_year | revenue | vote_average | vote_count
0 | 1639 | 200000000 | Titanic | 26.889070 | 1997 | 1845034240 | 7.500000 | 7768
1 | 205 | 175000000 | Waterworld | 16.885184 | 1995 | 264218224 | 5.898438 | 1017
2 | 2586 | 170000000 | Wild Wild West | 9.887602 | 1999 | 222104688 | 5.101562 | 1042
3 | 2711 | 160000000 | The 13th Warrior | 10.308026 | 1999 | 61698900 | 6.398438 | 524
4 | 2572 | 150000000 | Tarzan | 12.453452 | 1999 | 448000000 | 7.101562 | 1715
5 | 1809 | 140000000 | Lethal Weapon 4 | 14.470551 | 1998 | 285444608 | 6.300781 | 782
6 | 1808 | 140000000 | Armageddon | 13.235112 | 1998 | 553799552 | 6.500000 | 2540
7 | 2965 | 135000000 | The World Is Not Enough | 12.130127 | 1999 | 361832384 | 6.000000 | 878
8 | 3040 | 133000000 | Stuart Little | 8.359500 | 1999 | 300135360 | 5.800781 | 998
9 | 1773 | 130000000 | Godzilla | 11.295121 | 1998 | 379014304 | 5.300781 | 1075

Last rows

 | df_index | budget | original_title | popularity | release_year | revenue | vote_average | vote_count
490 | 3378 | 20000000 | Misery | 15.020845 | 1990 | 61276872 | 7.601562 | 1085
491 | 450 | 20000000 | Free Willy | 5.990572 | 1993 | 153698624 | 6.000000 | 429
492 | 3435 | 20000000 | Jennifer Eight | 4.620936 | 1992 | 11390479 | 5.800781 | 74
493 | 3135 | 20000000 | Wayne's World | 10.180776 | 1992 | 121697320 | 6.500000 | 738
494 | 8278 | 19885552 | Fire in the Sky | 6.118228 | 1993 | 19724334 | 6.500000 | 128
495 | 69 | 19000000 | From Dusk Till Dawn | 15.339153 | 1996 | 25836616 | 6.898438 | 1644
496 | 6661 | 19000000 | Sleeping with the Enemy | 8.560694 | 1991 | 174999008 | 6.101562 | 228
497 | 142 | 19000000 | Bad Boys | 9.262184 | 1995 | 141407024 | 6.500000 | 1729
498 | 1521 | 19000000 | Picture Perfect | 4.347079 | 1997 | 44332016 | 5.101562 | 114
499 | 7200 | 19000000 | Clifford | 2.459293 | 1994 | 7411659 | 5.300781 | 17